Important Molecular Descriptors Selection Using Self Tuned Reweighted Sampling Method for Prediction of Antituberculosis Activity

Authors

  • Doreswamy
  • Chanabasayya M. Vastrad
Abstract

In this paper, a new descriptor selection method named self tuned reweighted sampling (STRS) is developed for selecting an optimal combination of important descriptors from a sulfonamide derivatives dataset. Important descriptors are defined as those with large absolute coefficients in a multivariate linear regression model such as partial least squares (PLS). In this study, the absolute values of the regression coefficients of the PLS model are used as an index for evaluating the importance of each descriptor. Then, based on the importance level of each descriptor, STRS sequentially selects N subsets of descriptors from N Monte Carlo (MC) sampling runs in an iterative and competitive manner. In each sampling run, a fixed ratio (e.g. 80%) of the samples is first randomly selected to establish a regression model. Next, based on the regression coefficients, a two-step procedure is adopted to select the important descriptors: rapidly decreasing function (RDF) based enforced descriptor selection, followed by self tuned sampling (STS) based competitive descriptor selection. After all runs are completed, N subsets of descriptors have been obtained, and the root mean squared error of cross validation (RMSECV) of the PLS model built on each subset is computed. The subset with the lowest RMSECV is taken as the optimal descriptor subset. The performance of the proposed algorithm is evaluated on a sulfonamide derivatives dataset. The results reveal a useful property of STRS: it can usually locate an optimal combination of a few important descriptors that are interpretable in terms of the biological activity of interest. Additionally, our study shows that STRS gives better predictions than both PLS modeling on the full descriptor set and Monte Carlo uninformative variable elimination (MC-UVE).
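The abstract does not give the exact form of the RDF or of the STS weighting. The Python sketch below shows one plausible reading of the selection loop. It assumes the exponential schedule and coefficient-weighted sampling of competitive adaptive reweighted sampling (CARS), borrowed here as a stand-in. The RDF shrinks the number of retained descriptors from all P at the first run to 2 at the last, and the competitive step then draws descriptors with probability proportional to |b_j|. The PLS fit itself is not implemented. `fit_coefs` is a hypothetical callable that should return the PLS regression coefficient of each active descriptor, and `rdf_ratio`, `select_step` and `strs_subsets` are likewise illustrative names.

```python
import math
import random


def rdf_ratio(i, n_runs, n_desc):
    """Fraction of descriptors kept at run i (1-based).

    Assumed CARS-style exponential schedule r_i = a * exp(-k * i), with
    a and k chosen so that r_1 = 1 (all descriptors) and r_N = 2 / n_desc.
    Requires n_runs >= 2.
    """
    a = (n_desc / 2) ** (1 / (n_runs - 1))
    k = math.log(n_desc / 2) / (n_runs - 1)
    return a * math.exp(-k * i)


def select_step(coefs, active, i, n_runs, n_desc, rng):
    """Two-step selection on the active descriptor indices for run i.

    coefs maps each active descriptor index to its regression coefficient.
    """
    # Step 1 (enforced, RDF): keep the descriptors with the largest |b_j|.
    n_keep = max(2, round(rdf_ratio(i, n_runs, n_desc) * n_desc))
    kept = sorted(active, key=lambda j: abs(coefs[j]), reverse=True)[:n_keep]
    # Step 2 (competitive, weighted sampling): draw n_keep times with
    # replacement, probability proportional to |b_j|; descriptors that are
    # never drawn are eliminated. The weights must not all be zero.
    weights = [abs(coefs[j]) for j in kept]
    drawn = rng.choices(kept, weights=weights, k=n_keep)
    return sorted(set(drawn))


def strs_subsets(fit_coefs, n_samples, n_desc, n_runs=50, ratio=0.8, seed=0):
    """Return the descriptor subset retained after each Monte Carlo run.

    fit_coefs(calib_idx, desc_idx) -> {j: b_j} is a hypothetical callable
    that fits a PLS model on the given samples and descriptors.
    """
    rng = random.Random(seed)
    active = list(range(n_desc))
    subsets = []
    for i in range(1, n_runs + 1):
        # Monte Carlo sampling: a fixed fraction of samples builds this run's model.
        calib = rng.sample(range(n_samples), int(ratio * n_samples))
        coefs = fit_coefs(calib, active)
        active = select_step(coefs, active, i, n_runs, n_desc, rng)
        subsets.append(active)
    return subsets
```

Each draw is weighted by |b_j| and duplicate draws collapse into one. As a result, descriptors with small coefficients tend to drop out, which makes the selection competitive. Each subset in the returned list would then be scored by the RMSECV of a PLS model built on it.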
The STRS-PLS model outperformed the PLS regression models based on the full descriptor set and on the descriptors selected by MC-UVE. It achieved a higher coefficient of determination for the test set (r²) of 0.8758 and a lower root mean square error of prediction (RMSEP) of 0.1676. Based on these results, it was concluded that the STRS method, applied to sulfonamide derivatives, appears to be a rapid and effective alternative to classical methods for predicting antituberculosis activity.

Keywords: MC-UVE, PLS, RDF, STRS, number of principal factors, RMSEP, RMSECV
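The figures of merit quoted in the abstract can be computed as in the minimal sketch below. The abstract does not define how r² was computed, so the common form 1 − SS_res/SS_tot is assumed, and the function names are illustrative. The same RMSE function gives RMSEP on an external test set and RMSECV when applied to cross-validated predictions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: RMSEP on a test set, or RMSECV when
    y_pred holds cross-validated predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def best_subset(subsets, rmsecv_values):
    """Return the descriptor subset with the lowest RMSECV, and that RMSECV."""
    best = min(range(len(subsets)), key=lambda k: rmsecv_values[k])
    return subsets[best], rmsecv_values[best]
```

In STRS terms, `best_subset` would receive the per-run subsets together with the RMSECV of the PLS model built on each one.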


Similar resources

A Priori Prediction of Tissue: Plasma Partition Coefficients (Log BP) of Drugs to Facilitate the Use of MLR and MLR-GA Methods

In drug discovery and development, it is important to determine whether a candidate molecule is capable of penetrating the plasma-brain barrier. The aim of this paper is to establish a predictive model for plasma-brain barrier penetration using simple descriptors. The usefulness of the quantum chemical descriptors, calculated at the level of the DFT and HF theories using the 6-31G* basis set for QSAR st...

Pixel selection by successive projections algorithm method in multivariate image analysis for a QSAR study of antimicrobial activity for cephalosporins and design new cephalosporins

Thirty-one Cephalosporin compounds were modeled using the multivariate image analysis and applied to the quantitative structure activity relationship (MIA-QSAR) approach. The acid dissociation constants (pKa) of cephalosporins play a fundamental role in the mechanism of activity of cephalosporins. The antimicrobial activity of cephalosporins was related to their first pKa by different models. B...

In-silico prediction of Cellular Responses to Polymeric Biomaterials from Their Molecular Descriptors

In this work quantitative structure activity relationship (QSAR) methodology was applied for modeling and prediction of cellular response to polymers that have been designed for tissue engineering. After calculation and screening of molecular descriptors, linear and nonlinear models were developed by using multiple linear regressions (MLR) and artificial neural network (ANN) methods. The root m...

Quantitative structure-activity relationship (QSAR) study of CCR2b receptor inhibitors using SW-MLR and GA-MLR approaches

In this paper, the quantitative structure activity-relationship (QSAR) of the CCR2b receptor inhibitors was scrutinized. Firstly, the molecular descriptors were calculated using the Dragon package. Then, the stepwise multiple linear regressions (SW-MLR) and the genetic algorithm multiple linear regressions (GA-MLR) variable selection methods were subsequently employed to select and implement th...


Journal title:
  • CoRR

Volume: abs/1402.5360

Publication date: 2013